ABSTRACT

Orchids are ornamental plants with a wide variety of types, where each type has its own
characteristics in the form of different shapes and colors. Here, we chose the support vector
machine (SVM), Naive Bayes, and k-nearest neighbor (KNN) algorithms, which produce text output.
This system aims to assist the community in recognizing orchid plants based on their type. We
used more than 2250 and 1500 images for training and testing, respectively, covering 15 types.
The test results show how the accuracy of the three supervised algorithms is affected by using
feature extraction or not and by several distance settings. We used SVM with linear, polynomial,
and Gaussian kernels, while KNN was run with K from 1 to 11. The experimental results show that
the linear kernel is the best classifier and that the extraction process increases accuracy.
Compared with Naive Bayes at 66% and the best KNN result of 98% at K=1 and d=1, SVM had better
accuracy. SVM-GLCM-HSV performed better than SVM-HSV alone, achieving 98.13% and 93.06%
respectively, both with the linear kernel. In addition, the combination of SVM and KNN yields
the highest accuracy among the algorithms selected here.
1. INTRODUCTION

Orchids are the largest group of monocot plants, with an estimated 25,000 species in the world.
The orchid is a beautiful flower and is among the ornamental plants that are widely cultivated in
Indonesia [1]; orchids are also produced as cut flowers. The uniqueness of orchids lies in the
shape and color [2] of the lip, or labellum [1], which distinguishes them from other plants.
Dendrobium orchids [3] are able to meet the demands of consumers whose tastes change over time.
This can be seen from the types of orchids on the market, which have varied flower colors and
shapes, as well as from the presence of new varieties with a more beautiful and attractive
appearance [4].

One type of orchid can be compared with another by its color, texture, and flower petals.
Knowing these differences makes it easier to classify orchid types. However, orchid plants have
many similarities in flower petals, texture, and color, so people find it difficult to
distinguish orchid species, especially lay people who do not yet know the characteristics of
each species. Therefore, a computer system is needed for the automatic classification of
flowers, which is expected to make it easier to classify orchid types [3]. The classification
process [5] is done using digital image processing technology. This information system is
expected to be accurate enough to determine the name and type of orchid. With this application,
people's knowledge of the names and types of orchids can be increased [6]. Feature extraction
[7], [8] is needed to obtain true and accurate information about the object contained in the
image. Classification is the process of finding a set of models or functions that describe and
distinguish each class of data, in order to predict the class of objects or data whose class
labels are unknown.

In this paper, we present a comparison of the accuracy of supervised learning algorithms,
including support vector machine (SVM), k-nearest neighbor (KNN), and Naive Bayes, in
non-real-time classification. This study provides an overview for novice researchers who are
developing a classification process with a supervised learning algorithm. To obtain the best
accuracy, we applied kernel and range variations. We use the SVM, KNN, and Naive Bayes (NB)
algorithms, which are quite popular but old-school. We chose them after considering the long
operating time and parameter-setting process of neural networks and their variations, for
example the convolutional neural network (CNN) and the multilayer perceptron (MLP). Not every
problem is best handled with the latest algorithms. The use of older algorithms also deserves
analysis, because much can still be learned from the results of such implementations.


2. RESEARCH METHOD 
2.1. Related research 

Nilsback [9] conducted a large-scale investigation of orchid images in 103 image classes using
4 features, namely texture, boundary, petal, and color. A single feature gave an accuracy of
55.1%, while the combination of all features gave an accuracy of 72.8%. Accuracy is affected by
the similarity between large and small classes in the testing data. SVM has also been used in
[10], where 13 orchid genotypes were identified by FTIR spectroscopy with 3 models, namely SSAE,
SVM, and KNN. SSAE proved to be more accurate than SVM and KNN. SSAE produces 99.4% accuracy
and 97.9% calibration, while KNN produces 100% accuracy but only 92.6% calibration.
Paphiopedilum orchid species were recognized using a CNN in a study by Arwatchananukul et al.
[11], using 1500 images and 15 classes with an accuracy of 98.6%. Image classification has also
been conducted using Naive Bayes, a simple statistical and probabilistic method, on shallot
quality, achieving high accuracy [12]. There, Naive Bayes was combined with the hue saturation
value (HSV) color model. Using 60 training and testing data, Naive Bayes produced 91.67%
accuracy. The choice of the HSV color model and color channels was adjusted to the theory of
human vision and might be replaced by another preprocessing method to obtain higher accuracy.
Jayech and Mabjoud compared regular Naive Bayes (RN) with tree augmented Naive Bayes (TAN) and
forest augmented Naive Bayes (FAN), and RN achieved the highest mean classification [13]. A
comparative study by K. Chandel et al. [14] showed that KNN is better than NB, with accuracies
of 93.44% and 22.56% respectively. H. T. Zaw et al. [15] applied Naive Bayes to detect brain
tumors. H. M. Zawbaa et al. [16] applied SVM to flower image classification. SVM was tested and
compared with a random forest (RF) classifier using 215 flowers. With SIFT and SFTA features,
the dataset was divided into 70% training and 30% testing; the results showed that SIFT-SVM
yielded higher accuracy, at 100%, than SFTA-SVM. In I. Mohamed et al. [17], a support vector
machine was trained with different strategies according to the organs and species of plants
using SIFT and OpponentColor SIFT. That experiment used a large dataset of 5061 leaf images with
uniform backgrounds, 2107 leaf images with natural backgrounds, and 2167 flower images. Before
classification with SVM, the images were clustered using K-Means. The optimal parameters, K=4000
and C=100, improved the score from 0.67 to 0.74. Fast and accurate detection of kiwifruit was
studied by L. Fu et al. [18], who showed that their model is small and efficient for real-time
kiwifruit detection in the orchard. That experiment required more time and a high hardware cost;
compared with R-CNN with ZFNet, Faster R-CNN with VGG16, YOLOv2, and YOLOv3-tiny, the DY3TNet
model achieved a precision of 0.9005 with 27 MB of data. A. Koirala et al. [19] used R-CNN with
the COCO dataset to detect mango fruit. They used around 400 training tiles. The Mango YOLO(bu)
model achieved an F1 score of 0.89 on a day-time mango image dataset. Detection on waxberry
images has also been done [20]. That research used the COCO dataset with MR-CNN and compared it
with K-Means on a verification sample set, with the average detection accuracy and recall rate
reaching 97% and 91%, respectively. Based on previous research and the urgency of classifying
orchid images, we propose SVM. The algorithm was tested and investigated using three kernels:
linear, polynomial, and Gaussian. We investigated the accuracy obtained both with GLCM-HSV and
with HSV only, and also compared it with KNN. We investigated the effect of using feature
extraction on achieving the best accuracy.


2.2. Proposed method 

The preprocessing steps include cropping, resizing, and extracting features from the image.
Cropping cuts the image to keep its important part, while resizing changes the size of an
image. Feature extraction is the stage that recognizes the characteristics or information of
objects in the image, where a feature is a form that is unique to the image. In this study,
orchid features are extracted using the gray level co-occurrence matrix (GLCM) method. Figure 1
shows the stages that we implemented to find the best accuracy.


Figure 1. Proposed scheme (blocks: orchid dataset, cropping and resizing, image separation, RGB value, RGB to HSV, HSV image)


Based on Figure 1, the orchid data consist of 2250 original images. The 2250 orchid images are
cropped and then resized to a common size of 512x512 pixels. The data are grouped into 2 parts,
namely training data and testing data: 2250 original images for training and 1500 original
images for testing. After the data are grouped, the RGB values of each original image are read.
The RGB values are converted to grayscale, and a co-occurrence matrix is then created with a
distance of 1 and an angle of 0°. The four parameters, namely contrast, energy, correlation, and
homogeneity, are computed using the MATLAB R2015a application, and the GLCM feature is formed
from these four values. The HSV values of each orchid image are obtained by converting RGB to
HSV, and the average values of hue, saturation, and value are then calculated. After the GLCM
and HSV values are obtained, the orchid flowers are classified using the SVM method, with the
previously extracted GLCM and HSV features as input. Once the classification results are
obtained, the level of accuracy is calculated, and the results of the classification process
are observed and recorded.


2.3. Hue saturation value 

HSV is a color feature used for basic color classification that tolerates changes in light
intensity [21]. RGB can be converted to HSV using the calculation steps in (1) to (10). Some of
the advantages of HSV compared to other color spaces lie in its components. Hue (H) describes
the original color, such as blue, yellow, or green, as seen clearly by human vision; its angular
values range from 0° to 360°. Saturation (S) is the relative purity of the color, represented as
the distance from the black-and-white axis, with a value of 0 to 100. Value (V) is represented
as the height on the black-and-white axis, that is, the darkness of a color. Here R is the red
value before normalization, r is the normalized red value, G is the green value before
normalization, g is the normalized green value, B is the blue value before normalization, b is
the normalized blue value, V is value, S is saturation, and H is hue; (8) to (10) are used to
convert the image to an 8-bit image. The value component ranges from 0 to 100, where 0 is
black, and saturation is likewise expressed on a scale up to 100.


$$r = \frac{R}{R+G+B} \tag{1}$$

$$g = \frac{G}{R+G+B} \tag{2}$$

$$b = \frac{B}{R+G+B} \tag{3}$$
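
The normalization in (1) to (3) and the averaging of hue, saturation, and value can be illustrated with a minimal Python sketch. It is not the MATLAB implementation used in this study: the function names `normalized_rgb` and `mean_hsv` are hypothetical, the conversion relies on the standard-library `colorsys.rgb_to_hsv`, and 8-bit input pixels and a plain arithmetic mean of hue are assumptions.

```python
import colorsys

def normalized_rgb(R, G, B):
    """Normalize raw channel values as in (1)-(3); returns (r, g, b)."""
    total = R + G + B
    if total == 0:
        # Pure black pixel: avoid division by zero (an assumed convention).
        return (0.0, 0.0, 0.0)
    return (R / total, G / total, B / total)

def mean_hsv(pixels):
    """Average hue (degrees), saturation (0-100) and value (0-100).

    `pixels` is a non-empty list of (R, G, B) tuples with 8-bit channels.
    """
    h_sum = s_sum = v_sum = 0.0
    for R, G, B in pixels:
        # colorsys expects channels in [0, 1] and returns h, s, v in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(R / 255.0, G / 255.0, B / 255.0)
        h_sum += h * 360.0
        s_sum += s * 100.0
        v_sum += v * 100.0
    n = len(pixels)
    return (h_sum / n, s_sum / n, v_sum / n)

if __name__ == "__main__":
    print(normalized_rgb(120, 60, 20))
    print(mean_hsv([(255, 0, 0), (0, 0, 255)]))
```

The arithmetic mean treats hue as a linear quantity; for images whose hues straddle 0° and 360°, a circular mean may be a better choice.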


2.4. Gray level co-occurrence matrix

GLCM is a feature extraction method based on second-order texture calculations, where a
second-order texture calculation considers the relationship between pairs of pixels in the
original image [22]. The neighboring pixel can be taken to the east (right). This relationship
is written as (1,0), denoting two horizontally adjacent pixels in which a pixel of value 1 is
followed by a pixel of value 0. Based on this arrangement, the number of pixel pairs that
satisfy the relationship is counted. The steps to calculate the GLCM are as follows. First, form
the initial GLCM matrix from pairs of two pixels in a direction of 0°, 45°, 90°, or 135°.
Second, form a symmetric matrix by adding the initial GLCM matrix to its transpose. Third,
normalize the GLCM matrix to eliminate dependence on image size by dividing each matrix element
by the number of pixel pairs. Finally, calculate the feature values from the GLCM. In (10), i is
the row index of the matrix, j is the column index, and p(i, j) is the co-occurrence matrix
element in row i and column j. In (11) to (16), i, j, and p(i, j) have the same meaning, μi and
μj are the mean values of the elements in the rows and columns of the matrix, and σi and σj are
the standard deviations for the rows and columns of the matrix.


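$$\mathrm{CON} = \sum_i \sum_j (i-j)^2 \, p(i,j) \tag{10}$$

$$\mu_i = \sum_i \sum_j i \, p(i,j) \tag{11}$$

$$\mu_j = \sum_i \sum_j j \, p(i,j) \tag{12}$$

$$\sigma_i = \sqrt{\sum_i \sum_j (i-\mu_i)^2 \, p(i,j)} \tag{13}$$

$$\sigma_j = \sqrt{\sum_i \sum_j (j-\mu_j)^2 \, p(i,j)} \tag{14}$$

$$E = \sum_i \sum_j p(i,j)^2 \tag{15}$$

$$H = \sum_i \sum_j \frac{p(i,j)}{1+|i-j|} \tag{16}$$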
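
The GLCM steps described in this section can be sketched in plain Python for a distance of 1 and an angle of 0°. This is an illustrative sketch, not the MATLAB routine used in the experiments: the function name `glcm_features` is hypothetical, the input is assumed to be a 2D list of integer gray levels at least two pixels wide, the correlation uses the common definition based on the row and column means and standard deviations, and the homogeneity weight 1/(1+|i-j|) is an assumption.

```python
import math

def glcm_features(gray, levels):
    """Texture features from a symmetric, normalized GLCM (offset: right neighbor).

    `gray` is a 2D list of ints in [0, levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    for row in gray:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1.0   # pair (pixel, right neighbor)
            counts[b][a] += 1.0   # adding the transpose makes the matrix symmetric
    total = sum(sum(r) for r in counts)
    # Normalize so that all entries sum to 1.
    cells = [(i, j, counts[i][j] / total)
             for i in range(levels) for j in range(levels)]

    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    energy = sum(v ** 2 for _, _, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    correlation = float("nan")   # undefined for a constant image
    if sd_i > 0 and sd_j > 0:
        correlation = sum((i - mu_i) * (j - mu_j) * v
                          for i, j, v in cells) / (sd_i * sd_j)
    return {"contrast": contrast, "energy": energy,
            "correlation": correlation, "homogeneity": homogeneity}

if __name__ == "__main__":
    image = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 2, 2, 2],
             [2, 2, 3, 3]]
    print(glcm_features(image, levels=4))
```

A real 512x512 grayscale image would normally be quantized to a small number of gray levels before counting pairs, which keeps the matrix compact.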


2.5. Support vector machine 

SVM works very well on high-dimensional data sets [23]. This method uses a kernel technique that
maps the original data from its original dimension to another, relatively higher dimension [24].
In the NN method, the training process studies all training data, whereas SVM only studies
selected data used in classification [25]. The k-nearest neighbor method stores all the training
data for use at prediction time [26], whereas SVM stores only a small portion of the training
data, the support vectors, for use at prediction time, as in (17), where b is the bias value, m
is the number of support vectors, and Φ(xi)·Φ(z) is the kernel function. For non-linear data,
the kernel method can be applied to the features of the initial data set [27]. The concept of
kernel substitution can also be used in other data analysis methods, but SVM is one of the
best-known methods that uses the kernel to represent data [28], [29].


$$f(\Phi(z)) = \operatorname{sign}(w \cdot \Phi(z) + b) = \operatorname{sign}\left(\sum_{i=1}^{m} \alpha_i y_i \, \Phi(x_i) \cdot \Phi(z) + b\right) \tag{17}$$
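
The decision rule in (17) can be illustrated with a short Python sketch that evaluates a two-class SVM from its support vectors. The function names, the kernel parameters (`degree`, `coef0`, `gamma`), and the toy support vectors and multipliers are assumptions chosen for illustration; they are not the trained model from this study, and a 15-class problem would additionally need a multi-class scheme such as one-vs-one.

```python
import math

def linear_kernel(x, z):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """(x . z + coef0) ** degree; degree and coef0 are assumed defaults."""
    return (linear_kernel(x, z) + coef0) ** degree

def gaussian_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2); gamma is an assumed default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_decide(z, support_vectors, alphas, labels, b, kernel):
    """Return +1 or -1 for sample z following the sign rule in (17)."""
    score = sum(alpha * y * kernel(x, z)
                for x, alpha, y in zip(support_vectors, alphas, labels))
    return 1 if score + b >= 0 else -1

if __name__ == "__main__":
    # Toy two-point problem with hand-set multipliers (illustrative only).
    svs = [(1.0, 1.0), (-1.0, -1.0)]
    alphas = [0.25, 0.25]
    labels = [1, -1]
    for kernel in (linear_kernel, polynomial_kernel, gaussian_kernel):
        print(kernel.__name__, svm_decide((2.0, 0.5), svs, alphas, labels, 0.0, kernel))
```

Only the support vectors enter the sum, which is why prediction does not need the full training set.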


2.6. K-nearest neighbor 

The KNN algorithm is a classification method that works based on the proximity of data to other
data, where a new query instance is classified according to the majority category among its
nearest neighbors [10]. Euclidean distance is a distance measure widely used for numeric data in
the k-nearest neighbor algorithm; it corresponds to drawing a straight line from the training
data point to the testing data point. According to Partiningsih et al. [30], Euclidean distance
can order images by level of similarity with good results. In (18), d(x, y) is the distance
between the training data and the testing data, xi is the training data, yi is the testing data,
i is the data variable, and p is the data dimension.


$$d(x,y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2} \tag{18}$$
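
A minimal Python sketch of KNN with the Euclidean distance in (18) is given here for illustration. The names `euclidean` and `knn_predict` and the toy feature vectors are hypothetical; the sketch assumes numeric feature tuples of equal length and a simple majority vote among the K nearest training samples.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors, as in (18)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(query, train_x, train_y, k=1):
    """Label of `query` by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy two-class data (illustrative only, not orchid features).
    train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
    train_y = ["A", "A", "A", "B", "B", "B"]
    print(knn_predict((0.5, 0.5), train_x, train_y, k=3))
    print(knn_predict((5.0, 5.2), train_x, train_y, k=1))
```

Odd values of K, such as the K=1 to K=11 settings tested in this study, avoid ties in a two-class vote; with more classes a tie-breaking rule is still needed.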


2.7. Naive Bayes 

The NB algorithm is a supervised learning method, which means it requires prior training data to
make decisions or predictions. The advantage of the Naive Bayes algorithm is that it does not
use numerical optimization, so it is computationally cheaper [31]. This algorithm is efficient
in training and can use binary or polynomial data [13], [32]. The classifier is based on

$$P(Y|X) = \frac{P(Y)\prod_{i} P(X_i|Y)}{P(X)}$$

where P(Y|X) is the probability of class Y given the data vector X, P(Y) is the initial
probability of class Y, P(Xi|Y) is the independent probability of feature Xi of vector X in
class Y, and P(X) is the probability of X. For categorical data, it only requires all the
possibilities that occur, while for continuous data the following steps can be used [33]:
(1) calculate the prior probability of each class; (2) calculate the mean of each feature as in
(19), where k is the number of data and xi is a data value; (3) compute the standard deviation
of each feature as in (20); (4) calculate the probability density as in (21), where μ and σ are
the mean and standard deviation from (19) and (20); (5) calculate the probability of each class
as shown in (22).

$$\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i \tag{19}$$

$$sd = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{k-1}} \tag{20}$$

$$P_{\mu\sigma}(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{21}$$

$$P = P(X|C_i) \times P(C_i) \tag{22}$$
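
The steps for continuous data can be sketched as a small Gaussian Naive Bayes in Python. This is an illustration under stated assumptions rather than the implementation used in this study: the function names are hypothetical, the standard deviation is the sample form (dividing by k-1), every class is assumed to have at least two samples, and every feature is assumed to have non-zero spread within each class.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per class: prior, feature means and sample standard deviations."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        k = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / k for col in columns]
        sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (k - 1))
               for col, m in zip(columns, means)]
        model[label] = (k / len(X), means, sds)
    return model

def gaussian_density(x, mean, sd):
    """Normal probability density of x for the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (math.sqrt(2 * math.pi) * sd)

def predict_nb(model, x):
    """Class with the largest prior times product of per-feature densities."""
    scores = {}
    for label, (prior, means, sds) in model.items():
        likelihood = 1.0
        for v, m, s in zip(x, means, sds):
            likelihood *= gaussian_density(v, m, s)
        scores[label] = prior * likelihood
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Toy two-feature data (illustrative only, not orchid features).
    X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
    y = ["A", "A", "A", "B", "B", "B"]
    model = fit_gaussian_nb(X, y)
    print(predict_nb(model, (1.1, 2.0)))
```

With many features, summing log densities instead of multiplying densities is a common way to avoid numerical underflow.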


3. RESULTS AND DISCUSSION 

Here, we used 15 types of orchid taken from ImageNet, namely Dendrobium, Brassavola, Cattleya,
Cymbidium, Epidendrum, Vanda, Pleurothallis, Oncidium, Calanthe, Coelogyne, Odontoglossum,
Masdevallia, Laelia, Caladenia, and Helleborine, with 100 images of each type used as testing
data at 512x512 pixels. Samples of the dataset for each type of orchid and samples of
preprocessing are shown in Figure 2 and Figure 3, respectively. Figure 2 can be used as a
reference for the values of the GLCM and HSV. Table 1 shows the accuracy of the classification
results with the SVM algorithm using linear, polynomial, and Gaussian kernels, with and without
GLCM HSV. We also compared our SVM with Naive Bayes. Following [3], accuracy was computed as in
(23) on all tested data.


$$\text{Accuracy} = \left|\frac{\text{true value} - \text{analysis value}}{\text{true value}}\right| \times 100\% \tag{23}$$


Figure 2. Preprocessing stages: (a) RGB to HSV, (b) HSV to grayscale


Figure 3. (a) Representation of GLCM value, (b) representation of HSV value


Here, we found that SVM is better than Naive Bayes, which yields only 66%, far below the SVM
accuracies of above 85%. Based on the classification process, classification using GLCM and HSV
feature extraction with the SVM algorithm is the best for classifying orchids. To provide
another perspective on this research, we conducted a further trial and tested the classification
model with k-nearest neighbor. We used KNN with HSV and GLCM, as shown in Table 2. We intend to
compare several supervised learning algorithms to illustrate the superiority of the SVM
algorithm. In addition to these comparisons, we also present a combination of SVM and KNN, with
and without feature optimization. SVM KNN turns out to produce the highest accuracy: SVM alone
classifies well, but the combination of SVM and KNN produces higher accuracy than SVM alone,
regardless of whether features are used. According to Table 2, each K achieves a different
accuracy, because the class is identified by pixel distance. Here, d=1 with K=1 produces the
highest accuracy. We also tested our dataset using Naive Bayes, and then compared the best KNN
result with SVM and Naive Bayes, as listed in Table 1.


Table 1. Comparison result between SVM and Naive Bayes


Method          Kernel       Accuracy
SVM GLCM HSV    Linear       98.13%
SVM GLCM HSV    Polynomial   97.6%
SVM GLCM HSV    Gaussian     94%
SVM GLCM        Linear       93.06%
SVM GLCM        Polynomial   87.86%
SVM GLCM        Gaussian     86.93%
Naive Bayes     -            66%


Table 2. Comparison of KNN results for various values of distance (d) and K


Distance (d)    K=1    K=3    K=5    K=7    K=9    K=11
1               98%    94%    93%    86%    76%    70%
4               93%    92%    86%    79%    74%    65%
8               89%    88%    80%    72%    62%    61%


SVM is a more reliable classifier; however, KNN is less computationally intensive than SVM
[34]. SVM has better performance than KNN [10], whereas J. Kim et al. [35] conclude that KNN is
better than SVM. Based on these studies, we investigated the accuracy of KNN and SVM. The
results show that SVM has superior performance compared to KNN, which we tested at K=1 to K=11
and pixel distance d=1, 4, or 8. The image data per class are the same as those used for SVM. As
shown in Table 1 and Table 3, KNN at K=1 still produces lower accuracy than SVM.


Table 3. Result enhancement from a combination of SVM and KNN


Method              Kernel       Accuracy
SVM KNN GLCM HSV    Linear       99.3%
SVM KNN GLCM HSV    Polynomial   98.6%
SVM KNN GLCM HSV    Gaussian     96.8%
SVM KNN GLCM        Linear       94.93%
SVM KNN GLCM        Polynomial   89%
SVM KNN GLCM        Gaussian     88%


Based on Table 3 and Figure 4, our proposed method produces the highest accuracy compared with
Naive Bayes and KNN. SVM-KNN gave better accuracy than SVM alone. Features provide new knowledge
that can also improve the accuracy of the machine, although not significantly. This is because
the GLCM feature plays a role in identifying image texture, especially in orchid images that
have curves and embossed lines, such as Dendrobium, Vanda, and Laelia, with a slightly hairy
element in each image. As shown in Table 3, it can be concluded that SVM-HSV is better than
Naive Bayes and KNN at several K values.





Figure 4. A comparison between SVM, KNN, and NB in various ranges and kernels


4. CONCLUSION 

We tested 1500 orchid images of 15 types of flowers using 3 supervised learning algorithms,
namely SVM, KNN, and Naive Bayes. In the experiments using features, KNN and SVM achieve higher
accuracy than when operated without features. If the algorithms are not combined, SVM produces
better accuracy than Naive Bayes both with and without features. On the other hand, SVM without
features is more accurate than some KNN results with features at certain K, whereas KNN at K=1
to K=5 with low d has higher accuracy than SVM GLCM with the Gaussian kernel. Combining SVM and
KNN, with or without features, increases the accuracy, but our experiments have not yet produced
100% accuracy. Another finding is that the linear kernel is the most suitable for the
classification process, giving better results than the polynomial or Gaussian kernels.
Maximizing accuracy remains our challenge. SVM may be combined with other algorithms or feature
extraction methods, for example with local binary pattern (LBP), to get better feature values
than GLCM.